18 research outputs found

    Syntax-semantics interactions – seeking evidence from a synchronic analysis of 38 languages [version 1; peer review: 1 approved]

    The notion that, to facilitate processing, as semantic complexity increases, syntactic complexity decreases, follows from various linguistic theories. This brief report presents the results of testing that notion, by analysing synchronic data from 38 languages and correlating canonical measures of semantic and syntactic difficulty. We expected an overall positive tendency. However, the results came out mixed to negative. There is a notable degree of variation and there are no clear tendencies within language families. After detailing the theoretical and cognitive reasons that support the original hypothesis, we conclude with a short discussion about the potential causes and implications of our findings. A possible interpretation is that the interaction we are looking for is more subtle than one might have assumed.

    Registerial adaptation vs. innovation across situational contexts : 18th century women in transition

    Endeavors to computationally model language variation and change are ever increasing. While analyses of recent diachronic trends are frequently conducted, long-term trends accounting for sociolinguistic variation are less well-studied. Our work sheds light on the temporal dynamics of language use of British 18th century women as a group in transition across two situational contexts. Our findings reveal that in formal contexts women adapt to register conventions, while in informal contexts they act as innovators of change in language use, influencing others. While adopted from other disciplines, our methods inform (historical) sociolinguistic work in novel ways. These methods include diachronic periodization by Kullback-Leibler divergence to determine periods of change and relevant features of variation, and event cascades as influencer models.
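As a rough illustration of the Kullback-Leibler divergence mentioned in this abstract, the sketch below compares the word distributions of two time slices. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `kl_divergence`, the add-one smoothing, and the toy token lists are all hypothetical.

```python
import math
from collections import Counter

def kl_divergence(tokens_a, tokens_b, smoothing=1.0):
    """Return D_KL(A || B) in bits and the per-word contributions.

    Both distributions are estimated over the joint vocabulary with
    additive smoothing (an assumption made here so that no probability
    is zero).
    """
    vocab = set(tokens_a) | set(tokens_b)
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    total_a = len(tokens_a) + smoothing * len(vocab)
    total_b = len(tokens_b) + smoothing * len(vocab)
    contributions = {}
    for word in vocab:
        p = (counts_a[word] + smoothing) / total_a
        q = (counts_b[word] + smoothing) / total_b
        contributions[word] = p * math.log2(p / q)
    return sum(contributions.values()), contributions

# Hypothetical toy data standing in for two periods of a corpus.
earlier = "thou art most kind and thou art wise".split()
later = "you are most kind and you are wise".split()
divergence, per_word = kl_divergence(later, earlier)
```

Words with the largest positive contribution are the ones most typical of the first argument relative to the second, which is how a divergence of this kind can point to features of variation as well as to periods of change.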

    Finding the Neural Net: Deep-learning Idiom Type Identification from Distributional Vectors

    The present work aims at automatically classifying Italian idiomatic and non-idiomatic phrases with a neural network model under constraints of data scarcity. Results are discussed in comparison with an existing unsupervised model devised for idiom type detection and a similar supervised classifier previously trained to detect metaphorical bigrams. The experiments suggest that the distributional context of a given phrase is sufficient to carry out idiom type identification to a satisfactory degree, with an increase in performance when input phrases are filtered according to human-elicited idiomaticity ratings collected for the same expressions. Crucially, employing concatenations of single word vectors rather than whole-phrase vectors as training input results in the worst performance for our models, unlike what was previously observed in metaphor detection tasks.

    Textual Alignment and Semantic Analysis of the Homeric Poems and selected Italian Translations between the XVIII and the XXI century

    The aim of this work is both to build a program that automatically aligns the original Homeric poems with their Italian translations (literary and free translations included) produced from the XVIII to the XXI century, and to show what kinds of analysis these alignments could allow. Over time, translations have changed, trying both to respect the text and to adapt to the aesthetic paradigms of the epoch they belonged to and of the translator himself. After a brief history of Italian translations of Homer, where I give a chronological account of the principal Italian translations of the Homeric poems between the XIV and the XXI centuries, I develop the two main parts of my work. In Part I, I explain the working principles of the textual aligner. After a summary of the state of the art in textual alignment in section 1.1 and an explanation of the reasons that drove me to choose proper names as anchor words (section 1.2), I proceed to give a detailed explanation of the program's mechanics in three sections. In section 2.1 I give an overview of the algorithm in its main steps; in section 2.2 I explain in detail how the text is segmented and how the anchor words are extracted and paired; in section 2.3 I summarize the principles of the Needleman-Wunsch algorithm; in section 2.4 I explain the mechanisms of the post-processing phase, where the alignment results are refined and enhanced. Some examples of the behavior of the aligner on different kinds of translations are given in section 2.5. Section 2.6 gives a very brief account of the performance of the aligner for translations in different European languages. Part II is devoted to the analysis of Italian translations of Homer. Sections 3.1 and 3.2 supply the state of the art and an explanation of the fundamental principles of distributional semantics.
To analyze Italian translations, I chose a set of Ancient Greek terms and a set of their Italian translations and I studied the similarity of those terms both in the Ancient Greek and Italian texts. Section 4.1 presents the selected terms and explains how the Ancient Greek words were chosen. To find their most common Italian counterparts I used both manual inspection and a method of automatic extraction, to which section 4.2 is dedicated. Chapter 5 shows the results of this analysis: section 5.1 discusses some quantitative aspects of Italian translations, such as the average period length or the semantic distance, and section 5.2 considers in detail the distributional similarities between the selected words in Ancient Greek and Italian texts. Finally, sections 5.3 and 5.4 examine some polysemy issues related to translation, such as the ways various multivocal words present in Homer were translated over time.
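The Needleman-Wunsch step named in this abstract can be sketched as a global alignment over sequences of anchor words. This is a minimal sketch, not the thesis's aligner: the scoring values, the function name `needleman_wunsch`, and the example name sequences are assumptions, and the segmentation and post-processing phases are not modelled.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b; return (score, aligned pairs).

    A pair (x, None) or (None, y) marks an element aligned to a gap.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return score[n][m], pairs

# Hypothetical anchor sequences: proper names found in an original
# passage and in a freer translation that omits one of them.
original = ["Achille", "Atena", "Zeus"]
translation = ["Achille", "Zeus"]
best, alignment = needleman_wunsch(original, translation)
```

With these toy inputs, the name missing from the translation ends up paired with a gap while the two shared names are matched. A real aligner would compare names across languages rather than test for string equality.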

    Detection and Aptness: A study in metaphor detection and aptness assessment through neural networks and distributional semantic spaces

    Metaphor is one of the most prominent, and most studied, figures of speech. While it is considered an element of great interest in several branches of linguistics, such as semantics, pragmatics and stylistics, its automatic processing remains an open challenge. First of all, the semantic complexity of the concept of metaphor itself creates a range of theoretical complications. Secondly, the practical lack of large scale resources for machine learning approaches forces researchers to work under conditions of data scarcity. This compilation thesis provides a set of experiments to (i) automatically detect metaphors and (ii) assess a metaphor's aptness with respect to a given literal equivalent. The first task has already been tackled by a number of studies. We approach it as a way to assess the potentialities and limitations of our approach, before dealing with the second task. For metaphor detection we were able to use existing resources, while we created our own dataset to explore metaphor aptness assessment, which constitutes the most innovative part of this research. In all of the studies presented here, I have used a combination of word embeddings and neural networks. This combination appears particularly effective since pre-trained word embeddings can provide the networks with information necessary to deal with metaphors under conditions of data scarcity. To deal with metaphor aptness assessment, we frame the problem as a case of paraphrase identification. Given a sentence containing a metaphor, the task is to find the best literal paraphrase from a set of candidates. We build a dataset designed for this task, that allows a gradient scoring of various paraphrases with respect to a reference sentence, so that paraphrases are ordered according to their degree of aptness. Therefore, we can use it for both binary classification and ordering tasks. This dataset is annotated through crowd sourcing by an average of 20 annotators for each pair. 
We then design a deep neural network to be trained on this dataset. We show that its architecture is able to achieve encouraging levels of performance, despite the serious data scarcity under which it is applied. In the final experiment of this compilation, more context is added to a sub-section of the dataset in order to study the effect of extended context on metaphor aptness rating. We show that extended context changes human perception of metaphor aptness and that this effect is reproduced by our neural classifier. The conclusion of the last study is that extended context compresses aptness scores towards the center of the scale, raising low ratings and decreasing high ratings given to paraphrase candidates outside of any context.

    Hybrid Language Segmentation for Historical Documents

    Language segmentation, i.e. the division of a multilingual text into monolingual fragments, has been addressed in the past, but its application to historical documents has been largely unexplored. We propose a method for language segmentation of multilingual historical documents. For documents that contain a mix of high- and low-resource languages, we leverage the high availability of high-resource language material and use unsupervised methods for the low-resource parts. We show that our method outperforms previous efforts in this field.

    Comparaison automatique de traductions françaises et italiennes de l'Odyssée
